Using the stephen_curry_shotdata_2014_15.txt dataset, replicate the graphics below as closely as possible. After replicating the graphics, provide a summary of what they indicate about Stephen Curry’s shot selection (i.e. distance from hoop) and shot make/miss rate, and how these relate/compare across distance and game time (i.e. across quarters/periods).
Plot 1
Hints:
Figure width 6 inches and height 4 inches, which is taken care of in code chunk yaml with fig-width and fig-height
Use minimal theme and adjust from there
While the plot needs to be very close to the one shown it does not need to be exact in terms of values. If you want to make it exact here are some useful values used, sometimes repeatedly, to make the plot: 12 & 14
Plot 2
Hints:
Figure width 6 inches and height 4 inches, which is taken care of in code chunk yaml with fig-width and fig-height
Use minimal theme and adjust from there
Useful hex colors: "#5D3A9B" and "#E66100"
No padding on vertical axis
Transparency is being used
annotate() is used to add labels
While the plot needs to be very close to the one shown it does not need to be exact in terms of values. If you want to make it exact here are some useful values used, sometimes repeatedly, to make the plot: 0, 0.04, 0.07, 0.081, 0.25, 3, 12, 14, 27
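The Plot 2 hints (transparency, annotate() labels, and no padding on the vertical axis) can be sketched on made-up data as follows. This is only a rough illustration, not the intended solution: the toy_shots data frame, its column names, and the label positions are all assumptions chosen for the example.

```r
library(ggplot2)

# Hypothetical toy data; the column names are assumptions, not the real dataset's
set.seed(2)
toy_shots <- data.frame(
  shot_distance = abs(c(rnorm(300, mean = 3, sd = 2),
                        rnorm(500, mean = 25, sd = 3))),
  result = sample(c("Made", "Missed"), size = 800, replace = TRUE)
)

p_density <- ggplot(toy_shots, aes(x = shot_distance, fill = result)) +
  # alpha < 1 makes the overlapping densities see-through
  geom_density(alpha = 0.25, show.legend = FALSE) +
  scale_fill_manual(values = c(Made = "#5D3A9B", Missed = "#E66100")) +
  # expand = c(0, 0) removes the padding below and above the density curves
  scale_y_continuous(expand = c(0, 0)) +
  # annotate() places one-off text labels (positions here are arbitrary)
  annotate("text", x = 8, y = 0.03, label = "Made", color = "#5D3A9B") +
  annotate("text", x = 15, y = 0.03, label = "Missed", color = "#E66100") +
  theme_minimal()
```

Printing p_density draws the plot. Because the labels come from annotate() rather than a legend, the legend is switched off in the density layer.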
Plot 3
Hints:
Figure width 7 inches and height 7 inches, which is taken care of in code chunk yaml with fig-width and fig-height
Colors used: "grey", "red", "orange", "yellow" (you don’t have to use "orange"; you can get away with using only "red" and "yellow")
To top-code so that 15+ is the highest value, set the limits in the appropriate scale while also setting na.value to the top color
While the plot needs to be very close to the one shown it does not need to be exact in terms of values. If you want to make it exact here are some useful values used, sometimes repeatedly, to make the plot: 0, 0.7, 5, 12, 14, 15, 20
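The top-coding hint can be sketched on made-up data as follows. This is an illustration under assumptions, not the intended solution: the toy_xy data, the use of geom_bin2d(), and the break positions are all choices made for the example. It relies on the default behavior of continuous ggplot2 scales, where values outside limits become NA and are then drawn in na.value.

```r
library(ggplot2)

# Hypothetical toy data standing in for shot locations
set.seed(1)
toy_xy <- data.frame(x = rnorm(2000), y = rnorm(2000))

p_topcode <- ggplot(toy_xy, aes(x, y)) +
  geom_bin2d(bins = 20) +
  scale_fill_gradient(
    low = "yellow", high = "red",
    limits = c(0, 15),          # counts above 15 fall outside the limits and become NA
    na.value = "red",           # NA bins are painted with the top color
    breaks = c(0, 5, 10, 15),
    labels = c("0", "5", "10", "15+")
  ) +
  theme_minimal()
```

Relabeling the top break as "15+" keeps the legend honest about what the top color means.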
Provide a summary of what the graphics above indicate about Stephen Curry’s shot selection (i.e. distance from hoop) and shot make/miss rate and how they relate/compare across distance and game time (i.e. across quarters/periods).
Solution
Based on plot 1, we see that Stephen Curry takes the most shots in the first and third quarters, presumably when he has the most energy. Additionally and unsurprisingly, the median make distances are lower than the median miss distances. With the exception of overtime, the distributions of both made shots and missed shots are similar across the game. Stephen Curry’s shot selection is more careful during overtime, as the shot distance distributions are noticeably lower during this period, although the sample size in overtime is quite small.
Based on plot 2, Stephen Curry’s shot distribution (for both makes and misses) is bimodal. For both makes and misses, the majority of his shots come near the rim or from behind the three-point line and not from mid-range. Unsurprisingly, he has more makes than misses near the basket and more misses than makes from deep. However, it is interesting to note that both distributions peak at the three-point line, indicating that this is Curry’s most common shooting range.
Plot 3 illustrates plot 2’s distribution by breaking down Curry’s shot attempts into regions on the court. Specifically, many of his attempts come at the basket or at the top of the three-point line/above the break. This suggests that he takes a lot of fast-break threes and generally has the ball in his hands a lot, as most of his threes do not come from the corner, where most players wait in catch-and-shoot position.
Exercise 2
Using the ga_election_data.csv dataset in conjunction with the mapping data ga_map.rda, replicate the graphic below as closely as possible. Note that the graphic is composed of two plots displayed side-by-side. Both plots use the same shading scheme (i.e. scale limits and fill options).
After replicating the graphic, provide a summary of how the two maps relate to one another. That is, what insight can we learn from the graphic?
Hints:
Figure width 7 inches and height 7 inches, which is taken care of in code chunk yaml with fig-width and fig-height
Make two plots, then arrange them accordingly using the patchwork package
patchwork::plot_annotation() will be useful for adding graphic title and caption; you’ll also set the theme options for the graphic title and caption (think font size and face)
ggthemes::theme_map() was used as the base theme for the plots
scale_*_gradient2() will be helpful
Useful hex colors: "#5D3A9B" and "#1AFF1A"
While the plot needs to be very close to the one shown it does not need to be exact in terms of values. If you want to make it exact here are some useful values used, sometimes repeatedly, to make the plot: 0.5, 0.75, 1, 10, 12, 14, 24
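The patchwork and scale_*_gradient2() hints can be sketched on made-up data as follows. This is a rough illustration, not the intended solution: the toy_grid tiles stand in for county polygons, theme_void() stands in for ggthemes::theme_map() to avoid an extra dependency, and the color direction, midpoint, limits, and font sizes are assumptions chosen for the example.

```r
library(ggplot2)
library(patchwork)

# Hypothetical toy grid standing in for county polygons
toy_grid <- expand.grid(x = 1:4, y = 1:4)
toy_grid$prop <- seq(0.5, 1, length.out = nrow(toy_grid))

# Build one panel; every panel gets the same fill scale so shading is comparable
make_panel <- function(panel_title) {
  ggplot(toy_grid, aes(x, y, fill = prop)) +
    geom_tile() +
    scale_fill_gradient2(
      low = "#1AFF1A", mid = "white", high = "#5D3A9B",
      midpoint = 0.75, limits = c(0.5, 1)   # assumed values, for illustration only
    ) +
    ggtitle(panel_title) +
    theme_void()
}

# `+` places the panels side by side; plot_annotation() adds the overall
# title and caption and styles them through its `theme` argument
combined <- (make_panel("Candidate A") + make_panel("Candidate B")) +
  plot_annotation(
    title = "Toy title",
    caption = "Toy caption",
    theme = theme(
      plot.title = element_text(size = 24, face = "bold"),
      plot.caption = element_text(size = 12)
    )
  )
```

Defining the fill scale once inside a helper keeps the limits and colors identical across panels, which is what makes the two maps directly comparable.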
Solution
Code
# data
ga_graph <- ga_data %>%
  # creating new column that contains the proportion of early voting for each candidate in each Georgia county
  mutate(
    prop_pre_eday = (absentee_by_mail_votes + advanced_voting_votes) / total_votes
  ) %>%
  # filter out all columns with "_vote" in the title
  select(-contains("_vote"))

# biden map data
biden_map_data <- ga_map %>%
  # filtering ga_graph to include Joe Biden data and using left join to combine ga_map and ga_graph, new dataset has all of ga_map's columns as well as "candidate" and "prop_pre_eday"
  left_join(
    ga_graph %>%
      filter(candidate == "Joseph R. Biden"),
    by = c("name" = "county")
  )

# trump map data
trump_map_data <- ga_map %>%
  # doing the exact same thing as above except filtering ga_graph to only include Donald Trump data
  left_join(
    ga_graph %>%
      filter(candidate == "Donald J. Trump"),
    by = c("name" = "county")
  )
Provide a summary of how the two maps relate to one another. That is, what insight can we learn from the graphic.
Solution
Based on these maps, there is a significant partisan divide between Democratic and Republican voters in their preference for early and/or mail-in voting. In the majority of Georgia counties, a greater proportion of Biden voters than Trump voters voted early or by mail. This is shown by the greater presence of purple shading on the map of voters who voted for President Biden.
Exercise 3
One thing that makes using ggplot2 in R so powerful is the ever-expanding set of extension packages. At this point, students should have the skills and know-how to make use of extension packages. The focus of this exercise is for students to demonstrate their ability to use a ggplot2 extension package.
For this exercise you will need the penguins dataset from the palmerpenguins package. The extension package is ggside. With a little internet searching you should be able to find the documentation for ggside and its GitHub repo, which has very useful examples.
Recreate the graphic below as closely as possible
Hints:
alpha values: 0.6 and 0.3
scale for the side panels: 0.3
You should be able to track down an example that is very similar to this
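A minimal sketch of how ggside side panels attach to a scatterplot is given below. It is not a reproduction of the target graphic: the choice of bill_length_mm and bill_depth_mm, the use of side densities, and the grouping by species are assumptions made for illustration. Only the alpha values and the side panel scale come from the hints.

```r
library(ggplot2)
library(ggside)
library(palmerpenguins)

# Drop rows with missing bill measurements so no rows are silently removed
keep <- complete.cases(penguins[, c("bill_length_mm", "bill_depth_mm")])
penguins_complete <- penguins[keep, ]

p_side <- ggplot(
  penguins_complete,
  aes(bill_length_mm, bill_depth_mm, color = species, fill = species)
) +
  geom_point(alpha = 0.6) +
  # side panel above the main plot: density of the x variable
  geom_xsidedensity(aes(y = after_stat(density)), alpha = 0.3) +
  # side panel to the right of the main plot: density of the y variable
  geom_ysidedensity(aes(x = after_stat(density)), alpha = 0.3) +
  theme_minimal() +
  # side panels take up 30% of the main panel's size
  theme(ggside.panel.scale = 0.3)
```

Other side geoms in ggside, such as geom_xsideboxplot(), follow the same pattern if the target graphic uses a different panel type.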